Double Q($\sigma$) and Q($\sigma, \lambda$): Unifying Reinforcement Learning Control Algorithms
Abstract
Temporal-difference (TD) learning is an important field in reinforcement learning. Sarsa and Q-learning are among the most widely used TD algorithms. The Q($\sigma$) algorithm (Sutton and Barto (2017)) unifies both. This paper extends the Q($\sigma$) algorithm to an online multi-step algorithm Q($\sigma, \lambda$) using eligibility traces, and introduces Double Q($\sigma$) as the extension of Q($\sigma$) to double learning. Experiments suggest that the new Q($\sigma, \lambda$) algorithm can outperform the classical TD control methods Sarsa($\lambda$), Q($\lambda$) and Q($\sigma$).
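As a minimal sketch of the unification the abstract describes (not the paper's implementation; the function name and signature are illustrative), the one-step Q($\sigma$) backup target interpolates between the Sarsa sample and the Expected Sarsa expectation:

```python
import numpy as np

def q_sigma_target(q_next, pi_next, a_next, reward, gamma, sigma):
    """One-step Q(sigma) backup target (illustrative sketch).

    sigma = 1 recovers the Sarsa target (pure sampling);
    sigma = 0 recovers the Expected Sarsa target (pure expectation).
    q_next  : Q-values of all actions in the successor state.
    pi_next : target-policy probabilities in the successor state.
    a_next  : action actually sampled in the successor state.
    """
    sample = q_next[a_next]                    # Sarsa component
    expectation = float(np.dot(pi_next, q_next))  # Expected Sarsa component
    return reward + gamma * (sigma * sample + (1.0 - sigma) * expectation)
```

Intermediate values of $\sigma$ mix the two targets; the Q-value is then updated toward this target with a step size $\alpha$ as usual.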
Similar references
Mini/Micro-Grid Adaptive Voltage and Frequency Stability Enhancement Using Q-learning Mechanism
This paper develops an adaptive control method for controlling the frequency and voltage of an islanded mini/micro grid (M/µG) using reinforcement learning. Reinforcement learning (RL) is a branch of machine learning and a principal solution method for Markov decision processes (MDPs). Among the several solution methods of RL, the Q-learning method is used for solving RL in th...
Two Novel On-policy Reinforcement Learning Algorithms based on TD(λ)-methods
This paper describes two novel on-policy reinforcement learning algorithms, named QV(λ)-learning and the actor critic learning automaton (ACLA). Both algorithms learn a state value-function using TD(λ)-methods. The difference between the algorithms is that QV-learning uses the learned value function and a form of Q-learning to learn Q-values, whereas ACLA uses the value function and a learning ...
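The distinguishing step of QV-learning described above can be sketched as follows; this is a hedged, minimal tabular form (the names and dictionary representation are illustrative, not the authors' code). The Q-update bootstraps off the separately learned state-value function V instead of Q itself:

```python
def qv_update(q, v, s, a, r, s_next, alpha, gamma):
    """One tabular QV-learning Q-value update (illustrative sketch):
    the TD target r + gamma * V(s') uses the state values V learned
    with TD(lambda), rather than Q-values as in Q-learning."""
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * v.get(s_next, 0.0) - old)
    return q
```

In parallel, V itself is updated with an ordinary TD($\lambda$) rule on states, which is the part both QV($\lambda$)-learning and ACLA share.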
Coexistence and criticality in size-asymmetric hard-core electrolytes
Liquid-vapor coexistence curves and critical parameters for hard-core 1:1 electrolyte models with diameter ratios $\lambda = \sigma_-/\sigma_+ = 1$ to $5.7$ have been studied by fine-discretization Monte Carlo methods. Normalizing via the length scale $\sigma_{\pm} = \frac{1}{2}(\sigma_+ + \sigma_-)$, relevant for the low densities in question, both $T^*_c$ ($= k_B T_c \sigma_{\pm}/q^2$) and $\rho^*_c$ ($= \rho_c s$...
Q Learning based Reinforcement Learning Approach to Bipedal Walking Control
Reinforcement learning has been an active research area not only in machine learning but also in control engineering, operations research and robotics in recent years. It is a model-free learning control method that can solve Markov decision problems. Q-learning is an incremental dynamic programming procedure that determines the optimal policy in a step-by-step manner. It is an online procedure for...
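The step-by-step procedure mentioned above is the standard tabular Q-learning backup; a minimal sketch (illustrative names, not the paper's controller) looks like this:

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
    """Standard tabular Q-learning backup: bootstrap off the greedy
    action value in the successor state (off-policy, model-free)."""
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q
```

Repeating this update along experienced transitions converges, under the usual step-size and exploration conditions, to the optimal action values.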
Lie ternary $(\sigma,\tau,\xi)$--derivations on Banach ternary algebras
Let $A$ be a Banach ternary algebra over a scalar field $\Bbb R$ or $\Bbb C$ and $X$ be a ternary Banach $A$--module. Let $\sigma,\tau$ and $\xi$ be linear mappings on $A$. A linear mapping $D:(A,[~]_A)\to (X,[~]_X)$ is called a Lie ternary $(\sigma,\tau,\xi)$--derivation if $$D([a,b,c])=[[D(a)bc]_X]_{(\sigma,\tau,\xi)}-[[D(c)ba]_X]_{(\sigma,\tau,\xi)}$$ for all $a,b,c\in A$, where $[abc]_{(\sigma,\tau,\xi)}=ata$...